Abstract. According to Brooke [3], “Usability does not exist in any
absolute sense; it can only be defined with reference to particular
contexts.” That is, one cannot speak of usability without specifying what
that particular usability is characterized by. Driven by the feedback of
a reviewer at an international conference, I explore how one
can precisely specify the kind of usability they are investigating in a
given setting. Finally, I come up with a formalism that defines
usability as a quintuple comprising the elements level of usability metrics,
product, users, goals and context of use. Providing concrete values for
these elements then constitutes the investigated type of usability. The
use of this formalism is demonstrated in two case studies.


1 Introduction 


In 2014, I submitted a research paper about a concept called Usability-based Split
Testing^1 to a web engineering conference [10]. My evaluation involved a
questionnaire that asked for ratings of different factors of usability based on a novel
usability instrument specifically developed for web interfaces m- This instrument
comprises the items informativeness, understandability, confusion, distraction,
readability, information density and reachability, which have been identified
as factors of usability in a confirmatory factor analysis m- So obviously, I used
the word “usability” a lot in that paper, however, without having thought about its
exact connotation in the context of my research beforehand. Of course I was aware of
the differences compared to user experience (UX; cf. [7]), but just assumed that
the questionnaire used and the description of my analyses would make clear what my
paper understands as usability.

^1 “Usability-based Split Testing” means comparing two variations of the same web interface
based on a quantitative usability score (e.g., usability of interface A = 97%, usability of
interface B = 42%) [T^. The split test can be carried out as a user study or under real-world
conditions [10].

Then came the reviews and one reviewer noted: 

“There is a weak characterization of what Usability is in the context 
of Web Interface Quality, quality models and views. Usability in this 
paper is a key word. However, it is weakly defined and modeled w.r.t. 
quality.” 

This confused me at first since I thought it was pretty clear what usability is and
that my paper was easily understandable in this respect. In particular, I
thought usability had already been defined and characterized before, so why did
this reviewer demand that I characterize it again? Figuratively speaking, they
asked me: “When you talk about usability, what is that »usability«?”


2 A Definition of Usability 


As I could not just ignore the review, I did some more research on definitions
of usability. I remembered that Nielsen defined usability to comprise five quality
components: Learnability, Efficiency, Memorability, Errors, and Satisfaction [S].
Moreover, I had already made use of the definition given in ISO 9241-11 [1] for
developing the usability questionnaire (cf. [H]) used in my evaluation:

“The extent to which a product can be used by specified users to achieve
specified goals with effectiveness, efficiency and satisfaction in a
specified context of use.” [1]

During the design of the questionnaire I had focused only on reflecting the
mentioned high-level factors of usability (effectiveness, efficiency, and
satisfaction) in the contained items. However, the rest of the definition is no
less interesting. In particular, it contains the phrases

1. “a product”;

2. “specified users”;

3. “specified goals”; and

4. “specified context of use”.

As can be seen, the word “specified” is used three times, and “a product” is also
a rather vague description here.

This makes it clear that usability is a difficult-to-grasp concept and even the ISO
definition [1] gives ample scope for different interpretations. Also, in his paper
on the System Usability Scale, Brooke [3] refers to ISO 9241-11 and notes that
“Usability does not exist in any absolute sense; it can only be defined with
reference to particular contexts.” Thus, one has to explicitly specify the four vague
phrases mentioned above to characterize the exact manifestation of usability they
are referring to. Despite my initial skepticism, that reviewer was absolutely right.

While usability is of course also an attribute of everyday things such as doors or
coffee machines, the fundamental assumption in this technical report is that we are
talking about settings that involve interfaces provided by visual displays, in
accordance with ISO 9241-11 [1].


3 Levels of Usability Metrics 


As the reviewer explicitly referred to “Web Interface Quality”, we also have to
take ISO/IEC 25010 [2] (which has replaced ISO/IEC 9126) into account. That
standard is concerned with software engineering and product quality and, among
other things, refers to three different levels of quality metrics [2]:

• Internal metrics, which measure a set of static attributes (e.g., related to 
software architecture and structure). 

• External metrics, which relate to the behavior of a system (i.e., they rely on 
execution of the software). 

• In-use metrics, which involve actual users in a given context of use. 

ISO/IEC 25010 defines usability as a subset of quality in use [2], which makes sense
as “usability” is derived from the word “use” and cannot be meaningfully applied
to products that are not actually used. Yet, it is possible to draw inferences about
usability from static attributes and from measures that rely on software execution
alone. Hence, we transfer the three types of metrics above into the context of
usability evaluation. By analogy, this gives us three levels of usability metrics:
internal usability metrics, external usability metrics, and usability in use metrics.

This means that if we want to evaluate usability, we first have to state which of the
above levels we are investigating. The first one (internal usability metrics) might be
assessed with a static code analysis, as carried out, for example, by accessibility
tools that, among other things, check whether the alt attributes of all images on a
webpage are set. The second (external usability metrics) might be assessed by an
expert going through a rendered interface without actually using the product, or
as is done by jQMetrics^2. Finally, usability in use metrics are commonly assessed
with user studies, either on a live website or in a more controlled setting.
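As a minimal sketch of what an internal usability metric could look like, the following Python snippet statically checks the markup of a page for images without an alt attribute. It is only an illustration of the idea, not the method of any particular accessibility tool; the class name MissingAltCounter, the function alt_coverage and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser


class MissingAltCounter(HTMLParser):
    """Counts <img> tags and those that carry no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.missing_alt = 0

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.images += 1
        if "alt" not in dict(attrs):
            self.missing_alt += 1


def alt_coverage(html: str) -> float:
    """Share of images with an alt attribute (1.0 if there are no images)."""
    counter = MissingAltCounter()
    counter.feed(html)
    if counter.images == 0:
        return 1.0
    return 1 - counter.missing_alt / counter.images


# Hypothetical page fragment: one image with alt text, one without.
page = '<p><img src="a.png" alt="Logo"><img src="b.png"></p>'
print(alt_coverage(page))  # 0.5
```

The check never renders or executes the page, which is what places it on the internal level rather than the external or in-use level.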

^2 https://github.com/globis-ethz/jqmetrics, retrieved January 22, 2015.


4 Bringing it all Together


Once we have decided on one of the above levels of usability metrics, we have
to give further detail on the four vague phrases contained in ISO 9241-11 [1].
Mathematically speaking, we have to find concrete values for the elements product,
users, goals, and context of use, which are sets of characteristics. Together with
the level of usability metrics, this gives us a quintuple defined by the following
Cartesian product:


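\[
  \mathit{usability} \in \mathit{LEVEL} \times \mathit{PRODUCT} \times \mathit{USERS} \times \mathit{GOALS} \times \mathit{CONTEXT}
\]

We already know the possible values for level of usability metrics:

\[
  \begin{aligned}
  \textit{level of usability metrics} &\in \mathit{LEVEL} \\
  \mathit{LEVEL} &= \{\text{internal}, \text{external}, \text{in use}\}
  \end{aligned}
  \tag{1}
\]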


So what are the possible values for the remaining elements contained in the
“quintuple of usability”?
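To make the structure of the quintuple tangible, it can be written down as a small data type. The following Python sketch is one possible encoding under my own assumptions: the names Level and Usability are invented for illustration, and the example values are merely placeholders chosen to show the shape of the data.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The three levels of usability metrics from Equation (1)."""
    INTERNAL = "internal"
    EXTERNAL = "external"
    IN_USE = "in use"


@dataclass(frozen=True)
class Usability:
    """One type of usability: a level plus four sets of characteristics."""
    level: Level
    product: frozenset
    users: frozenset
    goals: frozenset
    context: frozenset

    def as_tuple(self):
        # Ordered as in the Cartesian product defined above.
        return (self.level, self.product, self.users, self.goals, self.context)


# Placeholder values, chosen only to illustrate the structure.
u = Usability(
    level=Level.IN_USE,
    product=frozenset({"web application"}),
    users=frozenset({"novice users"}),
    goals=frozenset({"completed checkout process"}),
    context=frozenset({"desktop PC", "lab study"}),
)
print(len(u.as_tuple()))  # 5
```

Using frozen sets for the last four elements mirrors the statement that they are sets of characteristics, while the level is a single value.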


4.1 Product 

The first one is rather straightforward. Product is the actual product you are
evaluating, or at least the type thereof. In particular, web interface usability is
different from desktop software or mobile app usability. Also, it is important to
state whether one evaluates only a part of an application (e.g., a single webpage 
contained in a larger web app), or the application as a whole. Therefore: 


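\[
  \begin{aligned}
  \mathit{product} &\subseteq \mathit{PRODUCT} \\
  \mathit{PRODUCT} &= \{\text{desktop application}, \text{mobile application}, \text{web application}, \\
  &\qquad \text{online shop}, \text{WordPress blog}, \text{individual web page}, \ldots\}
  \end{aligned}
  \tag{2}
\]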

Since product is a subset of the potential values, it is possible to use any number
of them for a precise characterization of the element. For instance, product =
{mobile application, WordPress blog} if you are evaluating the mobile version of
your blog. This should not be thought of as a strict formalism, but is rather
intended as a convenient way to express the combined attributes of the element.
However, not all values can be meaningfully combined (e.g., desktop application
and WordPress blog). Therefore, the correct definition and usage are the
responsibility of the evaluator.^3 The same holds for the remaining elements
explained in the following.
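The subset relation and the caveat about meaningless combinations can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: PRODUCT is treated as a closed set although it is open-ended above, and the INCOMPATIBLE list as well as the function name check_product are hypothetical and would have to be maintained by the evaluator.

```python
# Closed stand-in for the open-ended set PRODUCT (an assumption of this sketch).
PRODUCT = {
    "desktop application", "mobile application", "web application",
    "online shop", "WordPress blog", "individual web page",
}

# Hypothetical list of combinations the evaluator considers meaningless.
INCOMPATIBLE = [{"desktop application", "WordPress blog"}]


def check_product(product: set) -> list:
    """Return a list of problems; an empty list means the value is usable."""
    problems = []
    unknown = product - PRODUCT
    if unknown:
        problems.append(f"not in PRODUCT: {sorted(unknown)}")
    for combination in INCOMPATIBLE:
        if combination <= product:
            problems.append(f"incompatible: {sorted(combination)}")
    return problems


print(check_product({"mobile application", "WordPress blog"}))  # []
print(check_product({"desktop application", "WordPress blog"}))
```

The second call reports one problem, since that combination is listed as incompatible. Such a check can only support the evaluator; deciding which combinations are meaningful remains their responsibility.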

^3 In this case, “evaluator” means the person who has to specify the considered type of usability.
This can also include stakeholders, product owners, developers etc.


4.2 Users


Next comes the element users, which relates to the target group of your product
(if evaluating in a real-world setting) or the participants involved in a controlled
usability evaluation (such as a lab study). Distinguishing between these is highly
important since different kinds of users perceive a product completely differently.
Also, real users (preferably in a real-world setting) are more likely to be unbiased
than participants in a usability study.


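\[
  \begin{aligned}
  \mathit{users} &\subseteq \mathit{USERS} \\
  \mathit{USERS} &= \{\text{visually impaired users}, \text{female users}, \text{users aged 19--49}, \\
  &\qquad \text{test participants}, \text{inexperienced users}, \text{experienced users}, \\
  &\qquad \text{novice users}, \text{frequent users}, \ldots\}
  \end{aligned}
  \tag{3}
\]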

In particular, when evaluating usability in a study with participants, this element
should contain all demographic characteristics of that group. Yet, when using
methods such as expert inspections (cf. [13]), users should not contain “usability
experts,” as your interface is most probably not exclusively designed for that very
specific group. Rather, it contains the characteristics of the target group the expert
has in mind when performing, for instance, a cognitive walkthrough (cf. [12]). This
is due to the fact that usability experts are usually well-trained in simulating a
user with specific attributes.


4.3 Goals 

The next one is a bit tricky, as goals are not simply the tasks a specified user shall
accomplish (such as completing a checkout process). Rather, there are two types
of goals according to Hassenzahl [6]: do-goals and be-goals.

Do-goals refer to the pragmatic dimension, which means “the product’s perceived
ability to support the achievement of [tasks]” [6], as for example the aforementioned
completion of a checkout process.

In contrast, be-goals refer to the hedonic dimension, which “calls for a focus on
the Self” [6]. To give just one example, the ISO 9241-11 [1] definition contains
“satisfaction” as one component of usability. Therefore, “feeling satisfied” is a
be-goal that can be achieved by users. The achievement of be-goals need not
be connected to the achievement of corresponding do-goals, i.e., do-goals
are not inevitably a prerequisite [6]. This means that a user can be satisfied
even if they failed to accomplish certain tasks and vice versa [6].

Thus, it is necessary to take these differences into account when defining the specific
goals to be achieved by a user. The element goals can be specified either by the
concrete tasks the user shall achieve or by Hassenzahl’s [6] more general notions if
no specific tasks are defined:


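\[
  \begin{aligned}
  \mathit{goals} &\subseteq \mathit{GOALS} \\
  \mathit{GOALS} &= \{\text{do-goals}, \text{be-goals}, \text{completed checkout process}, \\
  &\qquad \text{writing a blog post}, \text{feeling satisfied}, \text{having fun}, \ldots\}
  \end{aligned}
  \tag{4}
\]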

In particular, the dimensions of usability given by ISO 9241-11 [1] (effectiveness,
efficiency and satisfaction) can be expressed by elements of the set GOALS: “being
effective”, “being efficient” and “being satisfied”.

For more information about goal-directed design, the interested reader may refer
to 0.


4.4 Context of use 

Last comes the element context of use. This one describes the setting in which you
want to evaluate the usability of your product. In particular, context is strongly
connected to device-related differences, e.g., a desktop PC vs. a touch device.
Recently, the British newspaper The Guardian reported that their website is accessed
by 6000 different types of devices per month.^4 However, it is not sufficient to define
context only by the device used. It also contains more general information about
the setting (such as “real world” or “lab study” to indicate a potential bias of the
users involved), user-related properties and other more specific information. For
instance, if you are evaluating the usability of a location-based service, your context
most probably includes mobile devices that are used outside, i.e., with a potentially
higher noise level than at home, suboptimal light conditions and a potentially weak
signal strength. In 0, Dey defines context as follows:

“Context is any information that can be used to characterize the
situation of an entity. An entity is a person, place, or object that is
considered relevant to the interaction between a user and an
application, including the user and applications themselves.”

In general, your setting/context should be described as precisely as possible. 


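\[
  \begin{aligned}
  \textit{context of use} &\subseteq \mathit{CONTEXT} \\
  \mathit{CONTEXT} &= \{\text{real world}, \text{lab study}, \text{expert inspection}, \text{desktop PC}, \\
  &\qquad \text{mobile phone}, \text{tablet PC}, \text{at day}, \text{at night}, \text{at home}, \text{at work}, \\
  &\qquad \text{user is walking}, \text{user is sitting}, \ldots\}
  \end{aligned}
  \tag{5}
\]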

^4 http://next.theguardian.com/blog/responsive-takeover/, retrieved January 25, 2015.


5 Case Studies


5.1 Evaluation of a Search Engine Results Page 

For testing a research prototype in the context of my industrial PhD thesis, we
evaluated a novel search engine results page (SERP) designed for use with desktop
PCs [in]. The test was carried out as a remote asynchronous user study with
participants being recruited via internal mailing lists of the cooperating company.
They were asked to find a birthday present for a good friend that costs no more
than €50, which is a semi-open task (i.e., a do-goal). According to our above
formalization of usability, the precise type of usability u assessed in that evaluation
is therefore given by the following (for the sake of readability, the quintuple is given
in list form):^5

• level of usability metrics = in use

• product = {web application, SERP}

• users = {company employees, novice users, experienced searchers (several
times a day), average age ≈ 31, 62% male, 38% female}

• goals = {formulate search query, comprehend presented information, identify
relevant piece(s) of information}

• context of use = {desktop PC, HD screen, at work, remote asynchronous
user study}

In case the same SERP is inspected by a team of usability experts in terms of
screenshots, the assessed type of usability changes accordingly. In particular, users
changes to the actual target group of the web application, as defined by the
cooperating company and explained to the experts beforehand. Also, goals must be
reformulated to what the experts pay attention to (only certain aspects of a
system can be assessed through screenshots). Overall, the assessed type of usability
is then expressed by the following:

• level of usability metrics = external

• product = {web application, SERP}

• users = {German-speaking Internet users, any level of searching experience,
age 14-69}

• goals = {identify relevant piece(s) of information, be satisfied with
presentation of results, feel pleased by visual aesthetics}

• context of use = {desktop PC, screen width > 1225 px, expert inspection}

^5 As I have defined usability in terms of a quintuple and tuples are ordered lists of elements, the
formally correct notation for the user study would be: u = (in use, {web application, SERP},
{company employees, novice users, experienced searchers (several times a day), average
age ≈ 31, 62% male, 38% female}, {formulate search query, comprehend presented
information, identify relevant piece(s) of information}, {desktop PC, HD screen, at work,
remote asynchronous user study}).
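The difference between the two evaluations can also be read off mechanically once both characterizations are written down as data. The following Python sketch copies the values from the two lists above; the dictionary layout and the function name differing_elements are assumptions made purely for illustration.

```python
# Values copied from the two characterizations in this section.
study = {
    "level": "in use",
    "product": {"web application", "SERP"},
    "users": {"company employees", "novice users",
              "experienced searchers (several times a day)",
              "average age ≈ 31", "62% male", "38% female"},
    "goals": {"formulate search query", "comprehend presented information",
              "identify relevant piece(s) of information"},
    "context": {"desktop PC", "HD screen", "at work",
                "remote asynchronous user study"},
}
inspection = {
    "level": "external",
    "product": {"web application", "SERP"},
    "users": {"German-speaking Internet users",
              "any level of searching experience", "age 14-69"},
    "goals": {"identify relevant piece(s) of information",
              "be satisfied with presentation of results",
              "feel pleased by visual aesthetics"},
    "context": {"desktop PC", "screen width > 1225 px", "expert inspection"},
}


def differing_elements(a: dict, b: dict) -> list:
    """Names of the quintuple elements whose values differ between a and b."""
    return [name for name in a if a[name] != b[name]]


print(differing_elements(study, inspection))
# ['level', 'users', 'goals', 'context']
```

Only the product stays the same, so the two evaluations assess two different types of usability even though they look at the same results page.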


5.2 A New Usability Instrument for Interface Evaluation 


In m we describe the development of Inuit, a new usability instrument for
interface evaluation. As has already been mentioned in Section 1, Inuit comprises
the seven items informativeness, understandability, confusion, distraction,
readability, information density and reachability, which have been identified as factors
of usability in a confirmatory factor analysis. Yet, while such a limited set of items
also has its advantages, it narrows the types of usability that can be investigated in
settings based on this particular instrument, as explained in the following:

• level of usability metrics: The instrument is not suited for evaluations based
on internal usability metrics, as items such as readability or distraction
can only be meaningfully judged with respect to the rendered interface. Thus,
in this case $\textit{level of usability metrics} \in \{\text{external}, \text{in use}\}$.

• product: Using the instrument does not affect the types of products that can
be evaluated, as long as they involve visual displays, which is a fundamental
assumption in this technical report based on ISO 9241-11 [1]. Therefore,
$\mathit{product} \subseteq \mathit{PRODUCT}$.

• users: Using the instrument does not imply restrictions on the types of users
an investigated interface targets. Therefore, $\mathit{users} \subseteq \mathit{USERS}$.

• goals: As the instrument assesses seven specific factors of usability, the
investigated goals are limited and directly defined by the instrument’s items,
i.e., “finding a desired piece of information”, “understanding the presented
information”, “not being confused” etc. and/or more fine-grained goals that
are prerequisites for these (based on the specific interface that is
investigated). Moreover, the dimension satisfaction, which corresponds to goals
such as “feeling satisfied”, is not considered by the instrument in accordance
with [8]. Based on an assumption like “users are only satisfied when they
found their desired piece of information”, one could still try to infer
satisfaction from the given items. However, the instrument does not directly ask
users whether they were satisfied. Therefore, $\mathit{goals} \subseteq
\{\text{finding a desired piece of information}, \text{understanding the presented
information}, \text{not being confused}, \text{not being distracted}, \ldots\}$.

• context of use: Using the instrument does not affect the types of contexts
that can be evaluated. Therefore, $\textit{context of use} \subseteq \mathit{CONTEXT}$.
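The restrictions listed above can be turned into a simple check of whether a given type of usability is within reach of the instrument. The Python sketch below is deliberately simplified: it only knows the four goals named explicitly in the list (the original set is open-ended and also admits more fine-grained prerequisite goals), and the names INUIT_LEVELS, INUIT_GOALS and assessable_with_inuit are invented for this example.

```python
# Levels the instrument is suited for, per the first bullet above.
INUIT_LEVELS = {"external", "in use"}

# Only the goals named explicitly above; the real set is open-ended.
INUIT_GOALS = {
    "finding a desired piece of information",
    "understanding the presented information",
    "not being confused",
    "not being distracted",
}


def assessable_with_inuit(level: str, goals: set) -> bool:
    """True if level and goals fall within the restrictions sketched above.

    Product, users and context of use are not restricted by the instrument,
    so they are not part of this check.
    """
    return level in INUIT_LEVELS and goals <= INUIT_GOALS


print(assessable_with_inuit("in use", {"not being confused"}))    # True
print(assessable_with_inuit("internal", {"not being confused"}))  # False
print(assessable_with_inuit("in use", {"feeling satisfied"}))     # False
```

The last call returns False because the instrument does not directly ask users whether they were satisfied.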


6 Conclusion 


Usability is a term that spans a wide variety of potential manifestations. For
example, usability evaluated in a real-world setting with real users might be a totally
different kind of usability than usability evaluated in a controlled lab study, even
with the same product. Therefore, a given set of characteristics must be
specified; otherwise, the notion of “usability” is rather meaningless due to its high
degree of ambiguity. It is necessary to provide specific information on five
elements that have been identified based on ISO 9241-11 [1] and ISO/IEC 25010 [2]:
level of usability metrics, product, users, goals, and context of use. This has been
demonstrated in two case studies based on existing research. Although I have
introduced a seemingly mathematical formalism for characterizing the precise type
of usability one is assessing, it is not necessary to provide that information in the
form of a quintuple. Rather, my primary objective is to raise awareness of careful
specifications of usability, as many reports on usability evaluations (including the
original version of my research paper [10]) lack a complete description of what
they understand as »usability«.